Abstract

Building a knowledge base requires iterative refinement to correct imperfections
that keep lurking after each new version of the system. This paper concentrates
on the automatic refinement of incomplete domain models for planning systems,
presenting both a methodology for addressing the problem and empirical results
obtained from an implemented system in several domains when initial domain
knowledge is up to 50% incomplete. Planning knowledge may be refined
automatically through direct interaction with the environment. Missing
conditions cause unreliable predictions of action outcomes. Missing effects
cause unreliable predictions of facts about the state. The paper shows that,
contrary to popular belief, missing information is not necessarily associated
with execution failures. We present a practical approach based on continuous
and selective interaction with the environment that pinpoints the type of fault
in the domain knowledge that causes any unexpected behavior of the environment,
and resorts to experimentation when additional information is needed to correct
the fault. Our approach has been implemented in EXPO, a system that uses PRODIGY
as a baseline planner and improves its domain knowledge in several domains. The
empirical results presented show that EXPO dramatically improves its prediction
accuracy and reduces the number of unreliable action outcomes.


1  Introduction 


Building a knowledge base is a process that requires iteration to correct
errors that keep lurking after each new version of the system. Several types
of imperfections can appear simultaneously in any type of domain theory,
including incompleteness, incorrectness, and intractability [Mitchell et al.,
1986; Rajamoney and DeJong, 1987; Huffman et al., 1992]. In an EBL system, for
example, the rules of the theory are used to compose explanations, and an
imperfect theory may greatly impair the system's ability to build those
explanations. In fact, EBL systems are very brittle with respect to errors in
the domain theory, and a lot of the research in EBL concentrates on either
correcting them or making the system more robust [Danyluk, 1991; Hall, 1988;
Rajamoney, 1988]. There is a well developed framework to classify these errors
and understand how they affect the explanation process [Mitchell et al., 1986;
Rajamoney and DeJong, 1987].

In a planning system, the inaccuracies of the knowledge base may render
problems unsolvable or produce plans that yield unsuccessful executions.
However, there is not a good basis for understanding in which particular ways
the different types of faults in a domain theory affect the planner's
performance. Exploring this issue should provide a good framework for
understanding and evaluating systems that learn planning domain knowledge. In
this paper, we concentrate on the problem of missing domain knowledge, which is
technically known as incompleteness. Known operators may be missing
preconditions and/or effects, or entire operators may be absent from the domain
model. We describe the limitations of the capabilities of a planner in terms
of the types of incompleteness of its domain knowledge. The imperfections of
the domain knowledge have been closely related to planning and/or execution
failures [Hammond, 1986; Huffman et al., 1992], but we show in our discussion
that this is not necessarily the case.

The rest of this paper presents a summary and empirical results of our work
on autonomous refinement of incomplete planning domains [Carbonell and Gil,
1990; Gil, 1991a; Gil, 1992]. Learning is selective and task-directed: it is
triggered only when the missing knowledge is needed to achieve the task at hand.
Our approach is based on continuous and selective interaction with the
environment that leads to identifying the type of fault in the domain knowledge
that causes any unexpected behavior of the environment, and resorts to
experimentation when additional information is needed to correct the fault. The
new knowledge learned by experimentation is incorporated into the domain
and is immediately available to the planner. The planner in turn provides
a performance element to measure any improvements in the knowledge base.
This is a closed-loop integration of planning and learning by experimentation.
Research in the area of acquiring action models is mostly subsymbolic
[Mahadevan and Connell, 1992; Maes, 1991]. An important component of our
approach is the ability to design experiments to gather additional information
that is not available to the learner and yet is needed to acquire the missing
knowledge. Experimentation is vital for effective learning and is a very
powerful tool to refine scientific theories [Cheng, 1990; Rajamoney, 1988], but
current research on learning planning knowledge from the environment does
not address this issue directly [Shen, 1989; Kedar et al., 1991].

The approach has been implemented in a system called EXPO. EXPO's
underlying planning architecture is the PRODIGY system [Minton et al., 1989;
Carbonell et al., 1991], which provides a robust, expressive, and efficient
planner. The examples included in this paper are based on a robot planning
domain [Gil, 1992], but results are also shown for a complex process planning
domain [Gil, 1991b].

The paper is organized as follows. Section 2 presents a taxonomy of how
incomplete domain knowledge can affect the performance of a planning system.
Section 3 describes our approach to the automatic refinement of incomplete
planning domains and its implementation in EXPO. Finally, the empirical
results presented in Section 4 show that EXPO dramatically improves its
prediction accuracy and reduces the number of unreliable action outcomes.

2  Planning  with  Incomplete  Models 

This section groups the effects of incompleteness in planning domains in three
categories: unreliable action outcomes, unreliable predicate beliefs, and
unreliable coverage of the search space.

2.1  Unreliable  Action  Outcomes 

Suppose  that  a  planner  is  given  the  following  incomplete  operator: 

(OPEN'
  (params (<door>))
  (preconds
    (and
      (is-door <door>)
      ;the condition (unlocked <door>) is missing
      (next-to robot <door>)
      (dr-closed <door>)))
  (effects (
    (del (dr-closed <door>))
    (add (dr-open <door>)))))

OPEN' is incomplete: it is missing the condition (unlocked <door>). If the
planner uses OPEN' to open an unlocked door, the execution will be successful.
If the planner uses OPEN' to open a door that happens to be locked, the action
will have no effect. In this case, the planner made the wrong prediction of the
outcome of the action execution: that the door would be open. So if the
preconditions of an operator are incomplete, the planner's predictions of the
operator's outcome are unreliable, because the desired effects of the operator
may or may not be obtained. The success or failure of the action's execution
is thus beyond the planner's control, and it depends solely on the chances that
the unknown conditions happen to be true. Notice that an execution failure
is not necessarily obtained, since the missing conditions may happen to be
true.

Missing  conditions  of  context-dependent  effects  also  cause  unreliable  action 
outcomes,  since  the  planner  cannot  predict  when  the  effect  will  take  place. 
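
The point of this subsection can be illustrated with a minimal sketch in Python. It is not PRODIGY code: the `Operator` class, the `apply` function, and the propositional fact names are assumptions made for illustration, and the environment is simulated by the complete operator. The planner's prediction with OPEN' agrees with the environment only when the unknown condition happens to hold.

```python
from dataclasses import dataclass

# Hypothetical propositional encoding; names are illustrative, not PRODIGY syntax.
@dataclass(frozen=True)
class Operator:
    name: str
    preconds: frozenset
    adds: frozenset
    dels: frozenset

def apply(op, state):
    """Return the successor state; the state is unchanged if a precondition fails."""
    if not op.preconds <= state:
        return state
    return (state - op.dels) | op.adds

CLOSED_DOOR = {"is-door", "next-to-robot", "dr-closed"}

# The simulated environment behaves like the complete operator, which requires "unlocked".
OPEN_COMPLETE = Operator("OPEN", frozenset(CLOSED_DOOR | {"unlocked"}),
                         frozenset({"dr-open"}), frozenset({"dr-closed"}))
# The planner's model, OPEN', lacks that condition.
OPEN_PRIME = Operator("OPEN'", frozenset(CLOSED_DOOR),
                      frozenset({"dr-open"}), frozenset({"dr-closed"}))

def prediction_holds(state):
    """True when the planner's predicted outcome matches the environment's outcome."""
    predicted = apply(OPEN_PRIME, state)      # what the planner expects
    observed = apply(OPEN_COMPLETE, state)    # what the environment does
    return predicted == observed

unlocked_state = frozenset(CLOSED_DOOR | {"unlocked"})
locked_state = frozenset(CLOSED_DOOR)
```

With these definitions, the prediction holds for `unlocked_state` and fails for `locked_state`, so the same operator is reliable or not depending on a fact the planner does not know about.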

2.2  Unreliable  Predicate  Beliefs 

Consider  the  following  incomplete  operator: 

(PUTDOWN-NEXT-TO'
  (params (<ob>))
  (preconds
    (and (holding <ob>)
         (next-to robot <other-ob>)))
  (effects
    ((add (arm-empty))
     ;the effect (del (holding <ob>)) is missing
     (add (next-to <ob> <another-ob>)))))

If the planner uses this action to put down an object, the action's execution
will be reliable: the desired effects of the operator will be obtained. The
planner will only notice the change in the status of holding at this point if
it is monitoring the environment beyond the known effects. Although it may
be possible in some applications [Kedar et al., 1991; Shen, 1989], continuously
monitoring the status of all the known facts is highly impractical in real
domains, and furthermore it is not very cost-effective.

However, the planner may notice this change in the future. Suppose that it
continues executing actions successfully. Now it wants to put the same object
down again. Since it believes that it is still holding the object, it considers
this operator to put the object down. It is now that the planner notices that
the truth value of the predicate (holding obj) changed inadvertently. It is the
truth value of a predicate that is unreliable, not the action's outcome. The
action of putting down is reliable, since the planner can predict the outcome
of the action for any object that it is holding.

Notice that although the planner's prediction of the truth value of the
predicate failed, in this case the planner does not obtain an execution failure.
A missing effect is often mistakenly associated with an execution failure
[Hammond, 1986; Huffman et al., 1992], probably because of its negative
implication: the planner needs to patch the plan and achieve the desired value
of the predicate. In our example, holding needs to be reachieved. However, this
is not necessarily the case. Incomplete effects may also cause the elimination
of unnecessary subplans that achieve a goal that is already satisfied in the
world, as we illustrate in the following example.

Consider  the  following  operator: 

(PUTDOWN-NEXT-TO''
  (params (<ob>))
  (preconds
    (and (holding <ob>)
         (next-to robot <other-ob>)))
  (effects
    ((add (arm-empty))
     (del (holding <ob>))
     ;the effect (add (next-to <ob> <another-ob>)) is missing
     )))

Now suppose that the goal is not to hold a key and to have it next to a
certain box. The planner uses PUTDOWN-NEXT-TO'' to achieve not holding
the key, and then PUSH-OBJ to put the key next to the box. The planner is
unaware that PUTDOWN-NEXT-TO'' actually achieves both subgoals, and
that PUSH-OBJ is thus an unnecessary subplan (provided that the subplan is
not needed to achieve other goals). When the planner notices that the truth
value of next-to was changed inadvertently, it can eliminate the unnecessary
subplan. In this case, the unreliable prediction did not have any negative
implication for the planner; it even saved some extra work.
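
The difference between an unreliable outcome and an unreliable belief can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the dictionary encoding of operators, the ground fact strings, and the `execute` helper are hypothetical, and the environment is simulated by the complete operator. The planner's model is missing the delete effect on holding, yet every effect it knows about is observed, so no execution failure is raised while its internal state silently drifts from the world.

```python
def apply(op, state):
    """Apply add and delete lists when the preconditions hold in the state."""
    if not op["preconds"] <= state:
        return set(state)
    return (state - op["dels"]) | op["adds"]

PRECONDS = {"holding key", "next-to robot box"}

# Complete operator: this is how the simulated environment actually behaves.
PUTDOWN_COMPLETE = {"preconds": PRECONDS,
                    "adds": {"arm-empty", "next-to key box"},
                    "dels": {"holding key"}}
# The planner's model is missing the effect (del (holding <ob>)).
PUTDOWN_PRIME = {"preconds": PRECONDS,
                 "adds": {"arm-empty", "next-to key box"},
                 "dels": set()}

def execute(world, belief):
    """Run one step; return new world, new belief, and whether the known effects held."""
    new_world = apply(PUTDOWN_COMPLETE, world)
    new_belief = apply(PUTDOWN_PRIME, belief)
    # The planner only checks the effects it knows about.
    known_effects_hold = (PUTDOWN_PRIME["adds"] <= new_world
                          and not (PUTDOWN_PRIME["dels"] & new_world))
    return new_world, new_belief, known_effects_hold

start = {"holding key", "next-to robot box"}
world, belief, known_effects_hold = execute(start, start)
stale_beliefs = belief - world   # facts the planner still believes but that are false
```

Here `known_effects_hold` is true, so the action looks successful, while `stale_beliefs` contains the holding fact that will only be noticed when some later operator relies on it.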

2.3  Unreliable  Coverage  of  Search  Space 

The two previous sections describe how missing conditions and effects cause
undesirable behavior during plan execution. Incomplete domains may also
cause unreliable coverage of the search space. Notice that this would cause
complications at problem solving time, not execution time.

Consider the case of a missing operator. If there are no alternative operators
to use during the search, then problems may have no solution (even though
they would be solvable if the complete domain were available to the planner).
For example, if OPEN is missing from the domain then no other operator would
achieve the goal of opening a door, which would cause all the problems that
include this subgoal to have no solution. The same type of behavior occurs if
the missing effects of an operator were to be used for subgoaling. Consider for
example that the domain included an operator OPEN that is missing the effect
(add (dr-open <door>)). Any problem that causes subgoaling on opening
a door would have no solution.

Notice that in the previous section the missing effects caused different
complications. They did not preclude the operator from being part of a plan,
since some other known effect of the operator allowed its use for subgoaling.
So as long as some primary effect of each operator is known to the planner,
the missing effects could be detected as described in the previous section.

Another case of incompleteness occurs when a state is missing facts about
the world. For example, consider a state containing a description of a door
Door45 that connects Room4 and Room5. The state does not contain information
about the door being either locked or unlocked. In this case, some
operator's preconditions cannot be matched in the state. For example, OPEN
has a precondition that the door must be unlocked, and the planner cannot
consider using it for opening Door45. So when facts are missing from the state,
the applicability of operators is restricted to the known facts, and thus it may
not be possible to explore parts of the search space until more information
becomes available.
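
Both sources of unreliable coverage can be sketched in a few lines of Python. The encoding is hypothetical (dictionaries of ground fact strings, with helper names `achievers` and `applicable` chosen for illustration); it only shows why a missing primary effect leaves a subgoal without any operator to achieve it, and why a fact absent from the state blocks an otherwise complete operator.

```python
def achievers(goal, operators):
    """Names of operators with a known effect that adds the goal (usable for subgoaling)."""
    return sorted(name for name, op in operators.items() if goal in op["adds"])

def applicable(op, state):
    """An operator can be considered only if all its preconditions match known facts."""
    return op["preconds"] <= state

OPEN = {"preconds": {"is-door Door45", "unlocked Door45",
                     "next-to robot Door45", "dr-closed Door45"},
        "adds": {"dr-open Door45"},
        "dels": {"dr-closed Door45"}}
# Same operator with the primary effect (add (dr-open <door>)) missing.
OPEN_NO_PRIMARY = {**OPEN, "adds": set()}

complete_domain = {"OPEN": OPEN}
incomplete_domain = {"OPEN": OPEN_NO_PRIMARY}

# A state that says nothing about Door45 being locked or unlocked.
partial_state = {"is-door Door45", "next-to robot Door45", "dr-closed Door45"}
```

In this sketch `achievers("dr-open Door45", incomplete_domain)` is empty, so any problem that subgoals on opening the door has no solution, and `applicable(OPEN, partial_state)` is false until the lock status of Door45 becomes known.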

2.4  Summary 

Figure 1 summarizes the taxonomy of limitations of a planner caused by
incomplete domain knowledge. Missing conditions cause action execution failures.
If the missing condition is identified, a plan is needed to achieve it before
the action can be executed successfully. Missing side effects may cause either
unnecessary subplans or additional planning, but they do not cause execution
failures. Missing primary effects, operators, or data about the state may
leave some problems with no solution (even though they would be solvable if
the complete domain were available to the planner).

Figure 1: Limitations Caused by Incomplete Domain Knowledge in a Planner.

3  Incremental  Refinement  of  Planning  Domains 
through  Experimentation 

When users define operators for a planning system, the resulting operators
turn out to be operational for planning (i.e., the planner has some model of
the actions that it can use to build plans) but are incomplete in that users
often forget to include unusual preconditions or side effects. This section
presents our approach to the problem of refining planning domains that are
incomplete because they are missing operators' preconditions and effects. More
details can be found in [Gil, 1992; Gil, 1991a; Carbonell and Gil, 1990].

3.1  Detection  of  an  Imperfection 

A planner's ability to interact with its environment allows the detection of
knowledge faults. EXPO monitors the external world selectively and
continuously. Before the execution of an operator, EXPO expects the operator's
known preconditions to be satisfied, so it checks them in the external world.
If they are indeed satisfied, then EXPO executes the corresponding action.
The operator's known effects are now expected to have occurred in the world,
so EXPO checks them in the external world. Any time that the observations
disagree with the expectations, EXPO signals an imperfection and learning is
triggered.
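
The monitoring cycle just described can be sketched as follows. This is not EXPO's implementation: the function names, the fault labels "belief-fault" and "outcome-fault", and the simulated environment built by `make_world` are all assumptions made for illustration. The sketch only shows the order of the checks: known preconditions before execution, known effects after it, and a signal whenever observation and expectation disagree.

```python
def make_world(facts, true_op):
    """Simulated environment: observe(fact) reads it, act() applies the real operator."""
    world = set(facts)

    def observe(fact):
        return fact in world

    def act():
        if true_op["preconds"] <= world:
            world.difference_update(true_op["dels"])
            world.update(true_op["adds"])

    return observe, act

def monitored_execute(model_op, observe, act):
    """Check known preconditions, execute, then check known effects."""
    unmet = {p for p in model_op["preconds"] if not observe(p)}
    if unmet:
        return "belief-fault", unmet          # expected conditions do not hold
    act()
    wrong = ({f for f in model_op["adds"] if not observe(f)}
             | {f for f in model_op["dels"] if observe(f)})
    if wrong:
        return "outcome-fault", wrong         # expected effects did not occur
    return "ok", set()

CLOSED = {"is-door", "next-to-robot", "dr-closed"}
OPEN_COMPLETE = {"preconds": CLOSED | {"unlocked"},
                 "adds": {"dr-open"}, "dels": {"dr-closed"}}
OPEN_PRIME = {"preconds": CLOSED, "adds": {"dr-open"}, "dels": {"dr-closed"}}

locked = monitored_execute(OPEN_PRIME, *make_world(CLOSED, OPEN_COMPLETE))
unlocked = monitored_execute(OPEN_PRIME,
                             *make_world(CLOSED | {"unlocked"}, OPEN_COMPLETE))
```

Only the facts mentioned in the operator are observed, which reflects the selective character of the monitoring: the cost of each check is bounded by the size of the operator, not by the size of the state.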

3.2  Operator  Refinement 

EXPO uses the Operator Refinement Method [Carbonell and Gil, 1990] to
learn new preconditions and effects of operators. We now briefly describe the
implementation of this method in EXPO.

Acquiring New Preconditions

When an operator O executed in state S has an unpredicted outcome, EXPO
considers the working hypothesis that the preconditions of O are incomplete
and triggers learning to find out the missing condition C. C must have been
true (by coincidence) every time that O was executed before. EXPO keeps
track of every state in which each operator is executed. It looks up $S_O$, a
generalization of all the states in which O was successfully executed in the
past.¹ All the predicates in $S_O$ are considered potential preconditions of O.
(Notice that the currently known preconditions of O must be in $S_O$.) EXPO
then engages an experimentation process to discern which of those predicates
is the missing condition.

Because of the bias needed in the generalization of $S_O$, the missing
condition may not appear in $S_O$. If this is the case, none of the experiments
would be successful. EXPO would then retrieve a successful past application of
O and build a new set of candidate preconditions from the differences
between the state in which the execution failed and the state of that
successful application. If experimentation is not successful in this stage, the
current implementation of EXPO prompts the user for help. Ideally, it would
look for additional candidates (for example, predicates that are not included
in the state because they were never observed), and even consider the
alternative working hypothesis that O has conditional effects (instead of
missing a precondition).

¹ The generalization of states is done through the operator's bindings and uses
a version space framework.
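
The candidate-generation and experimentation steps can be approximated with the sketch below. It makes strong simplifying assumptions that are not from the paper: states are sets of ground facts, the generalization of past successful states is replaced by a plain set intersection (a stand-in for the version space generalization through operator bindings), and an experiment is reduced to a hypothetical callback `succeeds(state)` that reports whether the action works in a given state.

```python
def candidate_preconditions(successful_states, known_preconds):
    """Facts shared by every past successful state, minus the known preconditions."""
    s_o = set.intersection(*(set(s) for s in successful_states))
    return s_o - set(known_preconds)

def find_missing_condition(candidates, base_state, succeeds):
    """Experiment: falsify one candidate at a time and see whether the action still works."""
    for fact in sorted(candidates):
        if not succeeds(set(base_state) - {fact}):
            return fact
    return None   # no candidate explains the failure; other hypotheses are needed

KNOWN = {"is-door", "next-to-robot", "dr-closed"}
past_successes = [KNOWN | {"unlocked", "light-on"},
                  KNOWN | {"unlocked", "light-on", "holding-key"}]

def succeeds(state):
    # Stand-in for executing OPEN in the environment, which requires "unlocked".
    return KNOWN | {"unlocked"} <= state

candidates = candidate_preconditions(past_successes, KNOWN)
missing = find_missing_condition(candidates, past_successes[0], succeeds)
```

In this toy run both "light-on" and "unlocked" are candidates because they held in every past success; the experiment that removes "light-on" still succeeds, while the one that removes "unlocked" fails and identifies the missing condition. A `None` result corresponds to the situation described above in which the candidate set must be rebuilt.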

Previous work on refinement of left-hand sides (LHS) of rules has used the
concept learning paradigm in considering each LHS as a generalization of states
where the rule is applicable [Mitchell, 1978; Mitchell et al., 198'1; Langley,
1987]. However, EXPO uses this paradigm as a heuristic that guides the
search for a new condition, and not as a basis for finding it. EXPO uses
other heuristics to make the experimentation process more efficient. This is
described in detail in [Gil, 1991a; Gil, 1992].

Acquiring  New  Effects 

When a predicate P is found to have an unpredicted value, EXPO considers
the working hypothesis that some operator that was applied since the last
time P was observed had the unknown effect of changing P. EXPO retrieves
all operators executed since then, and considers them candidates for having
incomplete effects. Experiments with each operator monitoring P closely yield
the incomplete operator.
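
A minimal sketch of this procedure, under assumptions that are not from the paper: the execution history is a list of (step, operator name) pairs, the last observation of P is recorded as a step number, and an experiment is abstracted as a hypothetical callback `changes_predicate(name)` that executes the operator while observing P before and after.

```python
def candidate_operators(history, last_observed_step):
    """Operators executed after P was last observed, in order, without repeats."""
    seen, candidates = set(), []
    for step, name in history:
        if step > last_observed_step and name not in seen:
            seen.add(name)
            candidates.append(name)
    return candidates

def find_incomplete_operators(candidates, changes_predicate):
    """Experiment with each candidate while monitoring P; keep those that change it."""
    return [name for name in candidates if changes_predicate(name)]

# Hypothetical trace: (holding key) was last observed right after step 1.
history = [(1, "PICKUP"), (2, "GOTO"), (3, "PUTDOWN-NEXT-TO"), (4, "GOTO")]

def changes_predicate(name):
    # Stand-in for an experiment; in this toy world only putting down changes holding.
    return name == "PUTDOWN-NEXT-TO"

candidates = candidate_operators(history, last_observed_step=1)
incomplete = find_incomplete_operators(candidates, changes_predicate)
```

Restricting the candidates to the operators executed since the last observation keeps the number of experiments proportional to the length of that window rather than to the size of the domain.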

3.3  Summary 

Figure 2 summarizes learning by experimentation in EXPO. EXPO triggers
learning when something unpredicted happens, and focuses on experiments
that find the missing information that yields the correct prediction.
Experimentation is task-directed: always engaged within a particular context
that sets specific aims and purpose for what is to be learned. See [Gil, 1991a;
Gil, 1992] for more particulars on the experiments themselves.

4  Empirical  Results 

This section contains results that show the effectiveness of EXPO, i.e., that
it can indeed be used to acquire new knowledge that is useful to the problem
solver.

Figure 2: Learning by Experimentation.


The results presented in this section show EXPO learning in two different
domains: a robot planning domain and a complex process planning domain.
The robot planning domain is an extension of the one used by STRIPS that has
been used in other learning research in PRODIGY (see [Carbonell et al., 1991]
for references). The process planning domain contains a large body of
knowledge about the operations necessary to machine and finish metal parts
[Gil, 1991b], and was chosen because of its large size. The domains are compared
along some dimensions in Figure 3. [Gil, 1992] describes them in detail.

Figure 3: The robot planning and the process planning domains.


We want to control the degree of incompleteness of a domain in the tests.
We have available a complete domain D which has all the operators with all
their corresponding conditions and effects. With this complete domain, we
can artificially produce domains D' that have a certain percentage of
incompleteness (e.g., 20% of the preconditions are missing) by randomly removing
preconditions or effects from D. We will use $D'_{prec20}$ to denote a domain
that is incomplete and is missing 20% of the conditions, and $D'_{post20}$ to
denote a domain missing 20% of the postconditions. Notice that EXPO never has
access to D, only to some incomplete domain D'.
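
One way to produce such incomplete domains is sketched below. The paper does not give the procedure, so everything here is an assumption: the dictionary encoding of a domain, the function name `make_incomplete`, the toy three-operator domain, and the choice to sample uniformly over all (operator, precondition) pairs with a fixed seed for repeatability.

```python
import random

def make_incomplete(domain, field, fraction, seed=0):
    """Return a copy of the domain with a fraction of `field` entries removed at random."""
    rng = random.Random(seed)
    slots = [(name, item) for name in sorted(domain)
             for item in sorted(domain[name][field])]
    removed = set(rng.sample(slots, round(fraction * len(slots))))
    return {name: {**op, field: {i for i in op[field] if (name, i) not in removed}}
            for name, op in domain.items()}

def count(domain, field):
    """Total number of entries of the given kind across all operators."""
    return sum(len(op[field]) for op in domain.values())

# Toy complete domain D (hypothetical operators, 10 preconditions in total).
D = {
    "OPEN":  {"preconds": {"is-door", "unlocked", "next-to-robot", "dr-closed"},
              "effects": {"add dr-open", "del dr-closed"}},
    "CLOSE": {"preconds": {"is-door", "next-to-robot", "dr-open"},
              "effects": {"add dr-closed", "del dr-open"}},
    "LOCK":  {"preconds": {"is-door", "holding-key", "dr-closed"},
              "effects": {"add locked", "del unlocked"}},
}

D_prec20 = make_incomplete(D, "preconds", 0.20)
```

Calling `make_incomplete(D, "effects", 0.20)` would give the corresponding domain with missing postconditions. The complete domain D is left untouched, so it can serve as the simulated environment while the learner sees only the incomplete copy.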

EXPO learns new conditions and effects of incomplete operators. What
is a good measure of the amount of new knowledge acquired by EXPO in
each case? Missing preconditions may cause action execution failures. To
show that EXPO is effectively learning new preconditions, we ran the test
set several times during training. We compared the cumulative number of
wrong predictions during training with the number of problems in the test
set that could be executed successfully to completion. Missing effects may
cause wrong predictions of literals. We compared the cumulative number of
incorrect literals found during training with the number of incorrect literals
in the final state of the problems in the test set. Each wrong prediction
encountered during training is an opportunity for learning. At certain points
during learning, we ran the test set. Learning is turned off at test time, so
when a wrong prediction is found the internal state is corrected to reflect the
observations but no learning occurs.

Training set and test set were generated randomly, and they were independent
in all cases.

4.1  Results 

Figures 4(a) and 5(a) show the number of action execution failures that EXPO
detects during training with $D'_{prec20}$ and $D'_{prec50}$ respectively in the
robot planning domain. Figures 4(b) and 5(b) show how many solutions for
problems in the test set were successfully executed with $D'_{prec20}$ and
$D'_{prec50}$ respectively. The number of plans that PRODIGY is able to execute
correctly increases with learning.

The maximum number of unexpected action outcomes, indicated by the upper
limit of the y-axis, corresponds to learning all the missing preconditions.
For $D'_{prec20}$, notice that although EXPO does not acquire all the missing
domain knowledge, it has learned the knowledge necessary to execute successfully
the solutions to all the problems in the test set. In fact, after training with
40 problems EXPO can solve all the problems in the test set. Even though EXPO
learns new conditions with further training, they do not cause any improvement
in the performance. For $D'_{prec50}$, very few solutions to the test problems
are executed successfully in one case. This is because the situations
encountered during training do not cover the situations encountered in the test
problems, in that the knowledge needed to solve the test problems is not needed
to solve the training problems. (In fact, after training with the test set one
more new condition is learned which turns out to be common in the test set, and
thus the solutions to all the test problems can be successfully executed.)

Figure 4: Effectiveness of EXPO in the robot planning domain with 20%
of the preconditions missing ($D'_{prec20}$). (a) Cumulative number of
unexpected action outcomes in the execution of solutions to training problems
encountered by EXPO as the size of the training set increases. Each one presents
an opportunity for learning. (b) The number of plans successfully executed in
the test set increases as EXPO learns. The number of additional plans
successfully executed is indicative of the number of preconditions acquired by
EXPO.

Figure 5: Effectiveness of EXPO in the robot planning domain with 50% of
the preconditions missing ($D'_{prec50}$). (a) Cumulative number of unexpected
action outcomes during training. (b) Number of plans successfully executed in
the test set.

In the process planning domain, the tests were run in domains with 10% and
30% incompleteness using two training sets and two test sets. Figures 6 and 7
present results for $D'_{prec10}$ and $D'_{prec30}$ respectively when EXPO
acquires new preconditions. Even though this is a more complex domain, the
curves show results very similar to the results obtained for the robot planning
domain.

We also ran tests with domains where postconditions of operators were missing.
Figures 8 and 9 show the results for $D'_{post20}$ and $D'_{post50}$
respectively in the robot planning domain. As more incorrect literals are found
in the state, EXPO acquires new effects of operators. Thus, the number of
incorrectly predicted literals when running the test set is reduced
continuously.

Figure 6: Effectiveness of EXPO in the process planning domain with 10% of
the preconditions missing ($D'_{prec10}$). Two training sets and two test sets
were used. (a) Cumulative number of unexpected action outcomes during training.

4.2  Discussion 

The new preconditions and postconditions learned through EXPO improve
PRODIGY's performance by reducing the number of wrong predictions during
plan execution. The effectiveness of learning is not solely a characteristic of
the learning algorithm: it is heavily dependent on the situations presented
to EXPO during training. If the training problems cover situations that are
comparable to the ones in the test problems, then learning is more effective.
Notice that this is expected of any learning system.

Figure 7: Effectiveness of EXPO in the process planning domain with 30% of
the preconditions missing ($D'_{prec30}$). Two training sets and two test sets
were used. (a) Cumulative number of unexpected action outcomes during training.

Another effect of the nature of the training problems is that EXPO rarely
acquires all the knowledge that is missing from the domain. However, PRODIGY's
performance is always improved, and in many cases all the test problems can
be executed successfully after learning even though the improved domain may
not be complete. EXPO's domain knowledge becomes increasingly correct, because
learning is directed to find the missing knowledge needed to solve the task at
hand. Even though an action may have many more conditions and effects than those
currently known, only the ones that are relevant to the current situation are
acquired. EXPO shows that learning can improve a system's performance
and bring it to a point where it can function reasonably well with whatever
knowledge is available, be it a perfect model of the world or not.

Figure 8: Acquisition of new effects in the robot planning domain with 20% of
the effects missing ($D'_{post20}$). (a) Cumulative number of incorrect literals
found in the internal state during the execution of training problems as the
size of the training set increases. Each one presents an opportunity for
learning. (b) The number of incorrect literals of the final state in the test
set decreases as EXPO learns. This is indicative of the number of new effects of
operators acquired by EXPO.

Finally, EXPO is a proactive learning system. When a fault in the current
knowledge is detected, the information available to the learner may well be
insufficient for overcoming the fault. An important component of EXPO's
learning is the ability to design experiments to gather any additional
information needed to acquire the missing knowledge. Work on learning theory has
shown that the active participation of the learner in selecting the situations
that it is exposed to is an important consideration for the design of effective
learning systems [Angluin, 1987].

Figure 9: Acquisition of new effects in the robot planning domain with 50%
of the effects missing ($D'_{post50}$). (a) Cumulative number of incorrect
literals found during training. (b) Incorrect literals in the final state of
test problems.

5  Conclusions 

Learning from the environment is a necessary capability of autonomous
intelligent agents that must solve tasks in the real world. Our approach
combines selective and continuous monitoring of the environment to detect
knowledge faults with directed manipulation through experiments that lead to the
missing knowledge. The results presented in this paper show the effectiveness of
this approach in improving a planner's prediction accuracy and reducing the
number of unreliable action outcomes in several domains through the
acquisition of new preconditions and effects of operators.

This work is applicable to a wide range of planning tasks, but there are some
limitations. The state of the world must be describable with discrete-valued
features, and reliable observations must be available on demand. Actions must
be axiomatizable as deterministic operators in terms of those features.

Our work assumes an initially incomplete knowledge base. Future work
is needed to address other types of imperfections, including incorrectness
and intractability of planning domain knowledge.